Extending CCGbank with Quotes and Multi-modal CCG
Abstract
CCGbank is an automatic conversion of the Penn Treebank to Combinatory Categorial Grammar (CCG). We present two extensions to CCGbank which involve manipulating its derivation and category structure. We discuss approaches for the automatic re-insertion of removed quote symbols and evaluate their impact on the performance of the C&C CCG parser. We also analyse CCGbank to extract a multi-modal CCG lexicon, which will allow the removal of hardcoded language-specific constraints from the C&C parser, granting benefits to parsing speed and accuracy.
Similar resources
Creating a CCGbank and a Wide-Coverage CCG Lexicon for German
We present an algorithm which creates a German CCGbank by translating the syntax graphs in the German Tiger corpus into CCG derivation trees. The resulting corpus contains 46,628 derivations, covering 95% of all complete sentences in Tiger. Lexicons extracted from this corpus contain correct lexical entries for 94% of all known tokens in unseen text.
Integrating Verb-Particle Constructions into CCG Parsing
Despite their prevalence in the English language, multiword expressions like verb-particle constructions (VPCs) are often poorly handled by NLP systems. This problem is partly due to inadequacies in existing corpora; the primary corpus for CCG-oriented work, CCGbank, does not account for VPCs at all, and is inconsistent in its handling of them. In this paper, we apply some corrective transforma...
CCGbank: A Corpus of CCG Derivations and Dependency Structures Extracted from the Penn Treebank
This article presents an algorithm for translating the Penn Treebank into a corpus of Combinatory Categorial Grammar (CCG) derivations augmented with local and long-range word–word dependencies. The resulting corpus, CCGbank, includes 99.4% of the sentences in the Penn Treebank. It is available from the Linguistic Data Consortium, and has been used to train wide-coverage statistical parsers that ob...
Fully Lexicalising CCGbank with Hat Categories
We introduce an extension to CCG that allows form and function to be represented simultaneously, reducing the proliferation of modifier categories seen in standard CCG analyses. We can then remove the non-combinatory rules CCGbank uses to address this problem, producing a grammar that is fully lexicalised and far less ambiguous. There are intrinsic benefits to full lexicalisation, such as seman...
Parsing CCGbank with the Lambek Calculus
This paper will analyze CCGbank, a corpus of CCG derivations, for use with the Lambek calculus. We also present a Java implementation of the parsing algorithm for the Lambek calculus presented in Fowler (2009) and the results of experiments using that algorithm to parse the categories in CCGbank. We conclude that the Lambek calculus is computationally tractable for this task and provide insight...
Publication date: 2007